GH-50223: [C++][Compute] Support string_view/binary_view keys in the hash-aggregate Grouper#50224
GH-50223: [C++][Compute] Support string_view/binary_view keys in the hash-aggregate Grouper#50224fangchenli wants to merge 2 commits into
Conversation
|
Thanks for opening a pull request! If this is not a minor PR. Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project. Then could you also rename the pull request title in the following format? or See also: |
|
|
There was a problem hiding this comment.
Pull request overview
This PR extends the C++ hash-aggregate Grouper to accept string_view / binary_view group keys (fixing Table.group_by failures on view-typed keys) by adding a dedicated key encoder and routing view types through the non-fast GrouperImpl path.
Changes:
- Add
BinaryViewKeyEncoderto encode/decode view keys using the existing variable-length row format. - Update
RowEncoderandGrouperImplto dispatch*_viewkey types to the new encoder, while ensuringGrouperFastImplrejects view keys. - Add unit tests (row encoder + grouper) and update benchmarks to generate view-typed inputs appropriately.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| cpp/src/arrow/compute/row/row_encoder_internal.h | Declares BinaryViewKeyEncoder for view key encoding/decoding. |
| cpp/src/arrow/compute/row/row_encoder_internal.cc | Implements BinaryViewKeyEncoder and dispatches view types in RowEncoder::Init. |
| cpp/src/arrow/compute/row/row_encoder_internal_test.cc | Adds round-trip tests for array/scalar view key encoding. |
| cpp/src/arrow/compute/row/grouper.cc | Routes view keys to BinaryViewKeyEncoder; makes fast path reject view keys. |
| cpp/src/arrow/compute/row/grouper_test.cc | Adds functional coverage for view keys (consume/lookup/uniques + randomized tests). |
| cpp/src/arrow/compute/row/grouper_benchmark.cc | Generates view inputs via cast to benchmark the fallback path. |
Rationale for this change
Grouper rejected
string_viewandbinary_viewgroup keys withNotImplemented: Keys of type string_view, soTable.group_byon a view-typed key failed.What changes are included in this PR?
BinaryViewKeyEncoderinrow_encoder_internal.{h,cc}. It encodes view keys using the existing variable-length row format and decodes them into a fresh view array of the original type.GrouperImpl::Makedispatches view keys to the new encoder.GrouperFastImpl::CanUserejects view keys, so they fall back to GrouperImpl.Are these changes tested?
Yes, unit tests added.
Are there any user-facing changes?
Yes.
Table.group_bynow acceptsstring_viewandbinary_viewgroup keys